Gamera: A Python-based Toolkit for Structured Document Recognition
Abstract
This paper presents Gamera, a new toolkit that lets domain experts with limited programming experience build domain-specific structured document recognition applications. Rather than attempting to meet the needs of diverse users with a single monolithic application, the Gamera system draws on each user's knowledge of the target documents to create custom applications. The system allows a knowledgeable user to combine image processing and recognition tools in an intuitive, interactive, graphical scripting environment based on Python, which gives novice programmers a simple yet powerful and flexible programming environment. The resulting applications are also suitable for large-scale digitization projects, because they can run in batch-processing mode and are easily integrated into a digitization framework. Finally, the Python module system has been extended so that plugins can easily be written in Python or C++.
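To make the plugin and batch-processing ideas in the abstract more concrete, here is a minimal, self-contained Python sketch. It is not Gamera's API: every name in it (`PLUGINS`, `plugin`, `invert`, `crop_blank_rows`, `run_pipeline`, `batch`) is hypothetical, and a "page" is assumed to be a toy 2-D list of 0/1 pixels rather than a real image. It covers only the Python side. Writing plugins in C++ is not shown.

```python
"""Hypothetical sketch of a plugin registry plus a batch runner.

None of these names come from Gamera. They only illustrate user-written
plugin functions being chained into a pipeline that can then be run over
many pages without interaction (batch mode).
"""
from typing import Callable, Dict, List

# Assumption: a "page" is a 2-D list of pixels, 1 = black ink, 0 = white.
Page = List[List[int]]

# Hypothetical registry mapping plugin names to page-processing functions.
PLUGINS: Dict[str, Callable[[Page], Page]] = {}


def plugin(func: Callable[[Page], Page]) -> Callable[[Page], Page]:
    """Decorator that registers a function under its own name."""
    PLUGINS[func.__name__] = func
    return func


@plugin
def invert(page: Page) -> Page:
    """Swap black and white pixels."""
    return [[1 - px for px in row] for row in page]


@plugin
def crop_blank_rows(page: Page) -> Page:
    """Drop rows that contain no black pixels."""
    return [row for row in page if any(row)]


def run_pipeline(page: Page, steps: List[str]) -> Page:
    """Apply the named plugins to one page, in order."""
    for name in steps:
        page = PLUGINS[name](page)
    return page


def batch(pages: Dict[str, Page], steps: List[str]) -> Dict[str, Page]:
    """Batch mode: run the same pipeline over every page, no interaction."""
    return {name: run_pipeline(page, steps) for name, page in pages.items()}


if __name__ == "__main__":
    pages = {"p1": [[0, 0], [1, 0], [0, 0]], "p2": [[0, 1], [0, 0]]}
    print(batch(pages, ["crop_blank_rows", "invert"]))
    # prints {'p1': [[0, 1]], 'p2': [[1, 0]]}
```

The pipeline is described by a plain list of plugin names. In this sketch, that is what lets the same steps a user assembled interactively be replayed unattended over a whole collection.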
Related papers
Gamera: Optical music recognition in a new shell
An optical music recognition system has been completely overhauled and reformatted into a new framework called Gamera. The new open-source software is not only designed to recognize various music notations, including handwritten scores, but can be used to develop systems that can recognize many other structured documents. Gamera is intended to be used by domain experts with particular knowledge...
A Multiple-Choice Test Recognition System based on the Gamera Framework
This article describes JECT-OMR, a system that analyzes digital images representing scans of multiple-choice tests filled in by students. The system performs a structural analysis of the document in order to get the chosen answer for each question, and it also contains a bar-code decoder, used for the identification of additional information encoded in the document. JECT-OMR was implemented usin...
Using the Gamera Framework for Building a Lute Tablature Recognition System
In this article we describe an optical recognition system for historic lute tablature prints that we have built with the aid of the Gamera toolkit for document analysis and recognition. We give recognition rates for various historic sources and show that our system works quite well on printed tablature sources using movable types. For engraved and manuscript sources, we discuss some principal c...
The Gamera framework for building custom recognition systems
This paper describes the Gamera framework for building custom document recognition systems. This open-source system is designed to support the test-and-refine development cycle: an important style for developing recognition systems that work with difficult historical documents, since the solutions are often non-obvious. This paper explains the overall architecture of the system, in addition to d...
Transkribus Python Toolkit
This paper introduces an open source Python toolkit for the Transkribus platform. One part of the toolkit offers a Python client for the Transkribus RESTful interface. The second part offers various Document Understanding tools. The open-source toolkit is freely available through GitHub. Keywords—Transkribus platform, RESTful client, Document Understanding, Conditional Random Fields, Sequential...
Publication date: 2001